Makkah Weather Analysis

Introduction

This is an exploratory study investigating whether Makkah's temperature has changed over the last decade. The study is part of a developmental awareness series on YouTube, presented in Arabic by Eng. Ahmed H. Alfi'er Alshareef.

Part 1: Data Acquisition and Exploration

In [1]:
import numpy as np
import re
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sktime.utils.plotting.forecasting import plot_ys

import warnings

warnings.filterwarnings("ignore")

Contents:

Introduction

Part 1: Data Acquisition and Exploration:

0- Acquiring Data

1- Load Data

2- Clean Data

3- Exploratory Analysis

(Five questions in total)

  • Pattern of Temperature Daily, Monthly and Yearly (Minimum, Maximum and Average)
  • Maximum and Minimum Recorded Temperatures from 2009 to 2020
  • Average Temperature, Wind Speed, Humidity And Barometer Each Year Grouped By Month
  • Temperature, Wind Speed, Humidity And Barometer Distributions
  • Weather Condition Count And Its Percentage

4- Save Data

Part 2: Modeling

1- Load Data

2- Features Selection

  • Group Data Daily, Monthly and Yearly and Extract Min, Max and Ave.
  • Removing Seasonality

3- Models Selection

  • Check Different Models To Forecast Minimum Temperatures
    • Hyper-parameter Tuning
  • Check Different Models To Forecast Maximum Temperatures
    • Hyper-parameter Tuning

4- Conclusion

===================================

0- Acquiring Data

This data was collected from the TimeAndDate website (www.timeanddate.com), covering September 10th, 2009 to September 29th, 2020.

1- Load Data

In [2]:
makkah_weather = pd.read_csv('../data/makkah_weather.csv')
makkah_weather.head()
Out[2]:
city date hour temperature condition wind humidity barometer visibility
0 makkah 11-09-09 00:00:00 30.0 Clear 7 0.62 1005.0 16�
1 makkah 11-09-09 01:00:00 28.0 Clear No wind 0.66 1004.0 16�
2 makkah 11-09-09 02:00:00 27.0 Clear 7 0.66 1004.0 16�
3 makkah 11-09-09 03:00:00 27.0 Clear 7 0.66 1004.0 16�
4 makkah 11-09-09 04:00:00 27.0 Clear No wind 0.66 1004.0 16�

2- Clean Data

In [3]:
# Create datetime object and drop hour
makkah_weather['date'] = makkah_weather.date +' '+ makkah_weather.hour
makkah_weather['date'] = pd.to_datetime(makkah_weather['date'],  dayfirst=True, errors='coerce')
makkah_weather.drop('hour', axis=1, inplace=True)
makkah_weather.head()
Out[3]:
city date temperature condition wind humidity barometer visibility
0 makkah 2009-09-11 00:00:00 30.0 Clear 7 0.62 1005.0 16�
1 makkah 2009-09-11 01:00:00 28.0 Clear No wind 0.66 1004.0 16�
2 makkah 2009-09-11 02:00:00 27.0 Clear 7 0.66 1004.0 16�
3 makkah 2009-09-11 03:00:00 27.0 Clear 7 0.66 1004.0 16�
4 makkah 2009-09-11 04:00:00 27.0 Clear No wind 0.66 1004.0 16�
In [4]:
# The visibility column carries a mojibake unit suffix (e.g. '16�'); keep only the numeric part
makkah_weather.visibility = makkah_weather.visibility.str.split('�').str[0]
makkah_weather.head()
Out[4]:
city date temperature condition wind humidity barometer visibility
0 makkah 2009-09-11 00:00:00 30.0 Clear 7 0.62 1005.0 16
1 makkah 2009-09-11 01:00:00 28.0 Clear No wind 0.66 1004.0 16
2 makkah 2009-09-11 02:00:00 27.0 Clear 7 0.66 1004.0 16
3 makkah 2009-09-11 03:00:00 27.0 Clear 7 0.66 1004.0 16
4 makkah 2009-09-11 04:00:00 27.0 Clear No wind 0.66 1004.0 16
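The split-and-keep-first-token pattern above generalizes to any column carrying a textual unit suffix. A minimal sketch on a toy series (hypothetical values, not the notebook's data), including the later cast to float:

```python
import pandas as pd

# Toy strings with a trailing unit suffix, like the raw visibility column
vis = pd.Series(['16 km', '8 km', None, '5 km'])

# Keep only the leading numeric token, then cast; missing values survive as NaN
vis_clean = vis.str.split(' ').str[0].astype(float)
print(vis_clean.tolist())  # [16.0, 8.0, nan, 5.0]
```

Note that the `.str` accessor propagates `None`/`NaN` through both the split and the indexing, so no extra null handling is needed before the cast.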
In [5]:
makkah_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95722 entries, 0 to 95721
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   city         95722 non-null  object        
 1   date         95722 non-null  datetime64[ns]
 2   temperature  95717 non-null  float64       
 3   condition    95722 non-null  object        
 4   wind         95706 non-null  object        
 5   humidity     95716 non-null  float64       
 6   barometer    95710 non-null  float64       
 7   visibility   74851 non-null  object        
dtypes: datetime64[ns](1), float64(3), object(4)
memory usage: 5.8+ MB
In [6]:
print(makkah_weather.visibility.unique())
['16' nan '8' '1' '6' '0' '5' '4' '9' '7' '3' '2' '14' '53' '10']
In [7]:
print(makkah_weather.wind.unique())
['7' 'No wind' '6' '19' '9' '11' '15' '22' '26' '24' '33' '30' '28' '13'
 '4' '20' '32' '17' '41' nan '37' '35' '46' '2' '43' '44' '48' '39' '82'
 '74' '56' '182' '52' '50' '163' '102' '54' '21' '18']
In [8]:
print(makkah_weather.temperature.unique())
[30. 28. 27. 26. 25. 32. 34. 35. 36. 33. 31. 29. 37. 38. 39. 24. 23. 22.
 40. 41. 20. 21. 19. 18. 17. 16. 15. 43. 45. 46. 42. nan 44. 48. 50. 47.
 49. 51. 14. 13.]
In [9]:
makkah_weather = makkah_weather.replace({'wind':'No wind'},0)
In [10]:
# Change data type for columns
convert_dict = {'visibility':float, 'wind':float, 'temperature':float} 
makkah_weather = makkah_weather.astype(convert_dict) 
makkah_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95722 entries, 0 to 95721
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   city         95722 non-null  object        
 1   date         95722 non-null  datetime64[ns]
 2   temperature  95717 non-null  float64       
 3   condition    95722 non-null  object        
 4   wind         95706 non-null  float64       
 5   humidity     95716 non-null  float64       
 6   barometer    95710 non-null  float64       
 7   visibility   74851 non-null  float64       
dtypes: datetime64[ns](1), float64(5), object(2)
memory usage: 5.8+ MB
In [11]:
print(makkah_weather.isna().sum())
city               0
date               0
temperature        5
condition          0
wind              16
humidity           6
barometer         12
visibility     20871
dtype: int64
In [12]:
# Fill nan values by averaging the before & after values for temperature, wind, humidity, visibility and barometer
makkah_weather.temperature = (makkah_weather.temperature.ffill()+makkah_weather.temperature.bfill())/2
makkah_weather.wind = (makkah_weather.wind.ffill()+makkah_weather.wind.bfill())/2
makkah_weather.humidity = (makkah_weather.humidity.ffill()+makkah_weather.humidity.bfill())/2
makkah_weather.barometer = (makkah_weather.barometer.ffill()+makkah_weather.barometer.bfill())/2
makkah_weather.visibility = (makkah_weather.visibility.ffill()+makkah_weather.visibility.bfill())/2
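The fill above gives each gap the midpoint of the last valid value before it and the first valid value after it. A minimal sketch on a toy series (note that NaNs at the very start or end would survive, since `ffill` or `bfill` has nothing to propagate there):

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0, None, None, 7.0])

# ffill: [1, 1, 3, 3, 3, 7]; bfill: [1, 3, 3, 7, 7, 7]
# Their average fills each gap with the midpoint of its surrounding valid values
filled = (s.ffill() + s.bfill()) / 2
print(filled.tolist())  # [1.0, 2.0, 3.0, 5.0, 5.0, 7.0]
```

For single-point gaps this coincides with linear interpolation; for longer runs every point in the run gets the same midpoint value rather than a ramp.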
In [13]:
makkah_weather.isna().sum()
Out[13]:
city           0
date           0
temperature    0
condition      0
wind           0
humidity       0
barometer      0
visibility     0
dtype: int64
In [14]:
makkah_weather.groupby('condition')['condition'].count()
Out[14]:
condition
"                          13
Broken clouds             187
Clear                   37894
Dense fog                  18
Drizzle                     4
Duststorm                 122
Extremely hot              12
Fog                        69
Haze                       40
Hot                         2
Light rain                 10
Low level haze            142
Mild                        5
More clouds than sun        1
Overcast                    8
Partly cloudy              80
Partly sunny              804
Passing clouds          17125
Pleasantly warm             4
Rain                       41
Sandstorm                 337
Scattered clouds         4578
Smoke                       7
Sunny                   33793
Thundershowers             40
Thunderstorms             376
Warm                       10
Name: condition, dtype: int64
In [15]:
makkah_weather = makkah_weather.replace({'condition':'"'},'Clear')
print(makkah_weather.groupby('condition')['condition'].count())
condition
Broken clouds             187
Clear                   37907
Dense fog                  18
Drizzle                     4
Duststorm                 122
Extremely hot              12
Fog                        69
Haze                       40
Hot                         2
Light rain                 10
Low level haze            142
Mild                        5
More clouds than sun        1
Overcast                    8
Partly cloudy              80
Partly sunny              804
Passing clouds          17125
Pleasantly warm             4
Rain                       41
Sandstorm                 337
Scattered clouds         4578
Smoke                       7
Sunny                   33793
Thundershowers             40
Thunderstorms             376
Warm                       10
Name: condition, dtype: int64
In [16]:
makkah_weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95722 entries, 0 to 95721
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   city         95722 non-null  object        
 1   date         95722 non-null  datetime64[ns]
 2   temperature  95722 non-null  float64       
 3   condition    95722 non-null  object        
 4   wind         95722 non-null  float64       
 5   humidity     95722 non-null  float64       
 6   barometer    95722 non-null  float64       
 7   visibility   95722 non-null  float64       
dtypes: datetime64[ns](1), float64(5), object(2)
memory usage: 5.8+ MB
In [17]:
# Daily max, min & mean temp.
d_temp = makkah_weather.groupby([makkah_weather['date'].dt.date])['temperature'].agg(['min','max','mean'])

# Monthly max, min & mean temp.
m_temp = makkah_weather.groupby([makkah_weather['date'].dt.year, makkah_weather['date'].dt.month
                                ])['temperature'].agg(['min','max','mean'])

# Yearly max, min & mean temp.
y_temp = makkah_weather.groupby(makkah_weather['date'].dt.year)['temperature'].agg(['min','max','mean'])
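The groupby-on-`dt`-accessor pattern above, shown on a tiny hypothetical frame for clarity:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01 00:00', '2020-01-01 12:00', '2020-01-02 06:00']),
    'temperature': [20.0, 30.0, 25.0],
})

# Group hourly readings by calendar day and compute several statistics at once
daily = df.groupby(df['date'].dt.date)['temperature'].agg(['min', 'max', 'mean'])
print(daily)
```

Grouping by `df['date'].dt.year` and `.dt.month` together works the same way and yields a two-level index, which is what `m_temp` above contains.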

3- Exploratory Analysis

In [18]:
# Credit: https://stackoverflow.com/questions/43214978/seaborn-barplot-displaying-values
def show_values_on_bars(axs, h_v="v", space=0.4):
    def _show_on_single_plot(ax):
        if h_v == "v":
            for p in ax.patches:
                if ~np.isnan(p.get_height()):
                    value = float('{:.2f}'.format(p.get_height()))
                    if value >= 0:
                        _x = p.get_x() + p.get_width() / 2
                        _y = p.get_y() + p.get_height() + float(space)
                    else:
                        _x = p.get_x() + p.get_width() / 2
                        _y = p.get_y() + p.get_height() - float(space)
                    ax.text(_x, _y, value, ha="center") 
        elif h_v == "h":
            for p in ax.patches:
                if ~np.isnan(p.get_width()):
                    value = float('{:.2f}'.format(p.get_width()))
                    if value >= 0:
                        _x = p.get_x() + p.get_width() + float(space)
                        _y = p.get_y() + p.get_height()
                    else:
                        _x = p.get_x() + p.get_width() - float(space)
                        _y = p.get_y() + p.get_height()
                        
                    ax.text(_x, _y, value, ha="left")

    if isinstance(axs, np.ndarray):
        for idx, ax in np.ndenumerate(axs):
            _show_on_single_plot(ax)
    else:
        _show_on_single_plot(axs)
In [19]:
XSMALL_SIZE = 12
SMALL_SIZE = 14
MEDIUM_SIZE = 16
BIGGER_SIZE = 10

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=SMALL_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=XSMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=XSMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=XSMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title
In [20]:
sns.pairplot(makkah_weather);

1- Pattern of Temperature Daily, Monthly and Yearly (Minimum, Maximum and Average)

In [21]:
plot_ys(makkah_weather.temperature);
plt.title('$Makkah$ Temperature from 2009-2020', fontsize=16);
plt.ylabel('$Temperature$', fontsize=14);
In [22]:
# Daily temp. changes
plot_ys(d_temp.reset_index()['max'], d_temp.reset_index()['min'], d_temp.reset_index()['mean'], labels=['max', 'min','ave']);
plt.title('$Makkah$ Daily Max, Min & Ave Temperatures from 2009-2020', fontsize=16);
plt.ylabel('$Temperature$', fontsize=14);
In [23]:
# Monthly temp. changes
plot_ys(m_temp.reset_index(drop=True)['max'], m_temp.reset_index(drop=True)['min'], m_temp.reset_index(drop=True)['mean'], labels=['max', 'min','ave']);
plt.title('$Makkah$ Monthly Temperatures from 2009-2020', fontsize=16);
plt.ylabel('$Temperature$', fontsize=14);
In [24]:
# Yearly temp. changes
plot_ys(y_temp.reset_index(drop=True)['max'], y_temp.reset_index(drop=True)['min'], y_temp.reset_index(drop=True)['mean'], labels=['max', 'min','ave']);
plt.title('$Makkah$ Yearly Temperatures from 2009-2020', fontsize=16);
plt.ylabel('$Temperature$', fontsize=14);
In [25]:
# Yearly average temp.
plt.figure(figsize=(16,6))
sns.barplot(x=y_temp.index, y=y_temp.reset_index(drop=True)['mean'], palette='summer');
plt.title('$Makkah$ Yearly Average Temperatures from 2009-2020', fontsize=16);
plt.ylabel('$Temperature$', fontsize=14);
plt.xticks(rotation=90);

2- Maximum and Minimum Recorded Temperatures from 2009 to 2020

In [26]:
mkh_df = makkah_weather.drop(['visibility',
                              'city',
                              'condition'],
                             axis=1).groupby([makkah_weather.date.dt.year,
                                             makkah_weather.date.dt.month],
                                             as_index=False).agg(['min','max','mean'])
In [27]:
print('Maximum Recorded Temperature from 2009-2020:')
print(mkh_df['temperature']['max'][mkh_df['temperature']['max'] == mkh_df['temperature']['max'].max()])
Maximum Recorded Temperature from 2009-2020:
date  date
2010  6       51.0
Name: max, dtype: float64
In [28]:
print('Minimum Recorded Temperature from 2009-2020:')
print(mkh_df['temperature']['min'][mkh_df['temperature']['min'] == mkh_df['temperature']['min'].min()])
Minimum Recorded Temperature from 2009-2020:
date  date
2015  1       13.0
Name: min, dtype: float64

3- Average Temperature, Wind Speed, Humidity And Barometer Each Year Grouped By Month

In [29]:
def get_mean(month, typ):
    # f1 = mkh_df.index.get_level_values(0) == 2020
    f2 = mkh_df.index.get_level_values(1) == month
    c = mkh_df.iloc[(f2)]

    look_up = {'1': 'January', '2': 'February', '3': 'March', '4': 'April', '5': 'May',
               '6': 'June', '7': 'July', '8': 'August', '9': 'September',
               '10': 'October', '11': 'November', '12': 'December'}

    nam = look_up[str(month)]
    # d = pd.melt(c['temperature'], value_vars=['mean'])
    # d = d.rename(columns={'value': 'Temperature'})

    c['year'] = c.index.get_level_values(0)
    plt.figure(figsize=(16,6));
    aa = sns.barplot(data=c, x='year',y=c[typ]['mean'].values, palette='summer_r');
    
    sns.set_style("white");
    
    plt.title('$Average$ {} for {} from 2009-2020'.format(typ,nam));
    plt.ylabel(typ);
    
    if typ == 'temperature':
        plt.ylim(0, max(c[typ]['mean'].values)+4);
        show_values_on_bars(aa, h_v="v", space=1)
    elif typ == 'humidity':
        plt.ylim(0, max(c[typ]['mean'].values)+1);
        show_values_on_bars(aa, h_v="v", space=0.2)
    elif typ == 'barometer':
        plt.ylim(0, max(c[typ]['mean'].values)+110);
        show_values_on_bars(aa, h_v="v", space=10)
    else:
        plt.ylim(0, max(c[typ]['mean'].values)+2);
        show_values_on_bars(aa, h_v="v", space=1)
    
    plt.figure(figsize=(16,6));
    dum = c[typ]['mean'].values - c[typ]['mean'].values.mean() 
    aa = sns.barplot(data=c, x='year',y= dum, palette='summer_r');
    
    
    plt.title('$Average$ {} After Removing the Mean for {} from 2009-2020'.format(typ,nam)); 
    
    if typ == 'temperature':
        plt.ylim(min(dum)-1, max(dum)+1);
        show_values_on_bars(aa, h_v="v", space=0.3)
    elif typ == 'humidity':
        plt.ylim(min(dum)-0.5, max(dum)+0.5);
        show_values_on_bars(aa, h_v="v", space=0.1)
    else:
        plt.ylim(min(dum)-1, max(dum)+1);
        show_values_on_bars(aa, h_v="v", space=0.3)
    
    plt.figure(figsize=(16,6));
    plt.title('Average {} $Trend$ for {} from 2009-2020'.format(typ,nam));
    sns.regplot(data=c, x='year',y=c[typ]['mean'].values,fit_reg=True,color='gray',line_kws={'linestyle':'--'}, 
                 ci=None,scatter_kws={'s':180, 'color':'#00943e'});
    plt.ylabel(typ);
In [30]:
for m in range(1,13):
    get_mean(m, 'temperature')
In [31]:
for m in range(1,13):
    get_mean(m, 'humidity')
In [32]:
for m in range(1,13):
    get_mean(m, 'wind')
In [33]:
for m in range(1,13):
    get_mean(m, 'barometer')

4- Temperature, Wind Speed, Humidity And Barometer Distributions

In [34]:
plt.figure(figsize=(16,6));
sns.distplot(makkah_weather.temperature);
plt.title('Makkah $Temperature$ Distribution from 2009-2020', fontsize=16);
In [35]:
plt.figure(figsize=(16,6));
sns.distplot(makkah_weather.humidity);
plt.title('Makkah $Humidity$ Distribution from 2009-2020', fontsize=16);
In [36]:
plt.figure(figsize=(16,6));
sns.distplot(makkah_weather.barometer);
plt.title('Makkah $Barometer$ Distribution from 2009-2020', fontsize=16);
In [37]:
plt.figure(figsize=(16,6));
sns.distplot(makkah_weather.wind);
plt.title('Makkah $Wind$ Distribution from 2009-2020', fontsize=16);

5- Weather Condition Count And Its Percentage

In [38]:
cond = pd.DataFrame(makkah_weather.groupby('condition')['condition'].count())
# print(makkah_weather.groupby('condition')['condition'].count().sum())
cond['ratio'] = makkah_weather.groupby('condition')['condition'].count()/makkah_weather.groupby('condition')['condition'].count().sum()
In [39]:
plt.figure(figsize=(16,6))
cond = cond.sort_values(by='ratio', ascending=False);
aa = sns.barplot(x=cond.index, y=cond.ratio*100, palette='summer');
show_values_on_bars(aa, h_v="v", space=1)
plt.xticks(rotation=90);
plt.title('Makkah $Condition$ Count in % from 2009 to 2020', fontsize=16);
plt.ylim(0,43);
In [40]:
print('Total weather condition counts:\n')
print(cond.condition.sort_values(ascending=False))
Total weather condition counts:

condition
Clear                   37907
Sunny                   33793
Passing clouds          17125
Scattered clouds         4578
Partly sunny              804
Thunderstorms             376
Sandstorm                 337
Broken clouds             187
Low level haze            142
Duststorm                 122
Partly cloudy              80
Fog                        69
Rain                       41
Haze                       40
Thundershowers             40
Dense fog                  18
Extremely hot              12
Warm                       10
Light rain                 10
Overcast                    8
Smoke                       7
Mild                        5
Pleasantly warm             4
Drizzle                     4
Hot                         2
More clouds than sun        1
Name: condition, dtype: int64
In [41]:
# Keep only the dominant conditions (count > 15000) for the pie chart
tg = cond[cond.condition > 15000]

# Pie chart
labels = tg.index

fig1, ax1 = plt.subplots(figsize=(16,6));
ax1.pie(tg.ratio, labels=labels, autopct='%1.1f%%', startangle=90, explode=[0.01,0.01,0.01]);

#draw circle
centre_circle = plt.Circle((0,0),0.30,fc='white');
fig = plt.gcf();
fig.gca().add_artist(centre_circle);

# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal');
plt.tight_layout();
plt.title('Makkah $Condition$ Count in % from 2009 to 2020', fontsize=16);
plt.show();

4- Save Data

In [42]:
makkah_weather.head()
Out[42]:
city date temperature condition wind humidity barometer visibility
0 makkah 2009-09-11 00:00:00 30.0 Clear 7.0 0.62 1005.0 16.0
1 makkah 2009-09-11 01:00:00 28.0 Clear 0.0 0.66 1004.0 16.0
2 makkah 2009-09-11 02:00:00 27.0 Clear 7.0 0.66 1004.0 16.0
3 makkah 2009-09-11 03:00:00 27.0 Clear 7.0 0.66 1004.0 16.0
4 makkah 2009-09-11 04:00:00 27.0 Clear 0.0 0.66 1004.0 16.0
In [43]:
makkah_weather.to_csv('../data/cleand.csv', index=False)

Part 2: Modeling

In [44]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
# pd.set_option('display.max_rows', None)

# ================================================

from datetime import datetime
from datetime import timedelta
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from time import time
import tensorflow as tf
from tensorflow import keras

from sklearn.utils import shuffle
from sklearn.model_selection import GridSearchCV
from sklearn.feature_selection import VarianceThreshold, SelectFromModel, RFE

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, make_scorer

from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor ,RandomForestRegressor
from sklearn.linear_model import LinearRegression, ElasticNet



from sktime.forecasting.model_selection import SlidingWindowSplitter, ForecastingGridSearchCV
from sktime.forecasting.compose import EnsembleForecaster
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformers.single_series.detrend import Deseasonalizer, Detrender
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.compose import ReducedRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
from sktime.utils.plotting.forecasting import plot_ys
from sktime.transformers.single_series.detrend import ConditionalDeseasonalizer
from sktime.regression.compose import TimeSeriesForestRegressor

1- Load Data

In [45]:
# Stationarity test (augmented Dickey-Fuller)
"""If p-value < 0.05, reject the unit-root null: the series is stationary."""
def perform_adf_test(series):
    result = adfuller(series)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
In [46]:
makkah_weather = pd.read_csv('../data/cleand.csv', parse_dates=['date'])
makkah_weather.head()
Out[46]:
city date temperature condition wind humidity barometer visibility
0 makkah 2009-09-11 00:00:00 30.0 Clear 7.0 0.62 1005.0 16.0
1 makkah 2009-09-11 01:00:00 28.0 Clear 0.0 0.66 1004.0 16.0
2 makkah 2009-09-11 02:00:00 27.0 Clear 7.0 0.66 1004.0 16.0
3 makkah 2009-09-11 03:00:00 27.0 Clear 7.0 0.66 1004.0 16.0
4 makkah 2009-09-11 04:00:00 27.0 Clear 0.0 0.66 1004.0 16.0
In [47]:
# Limit the data to start from 2016
s = pd.Timestamp(year=2016, month=1, day=1)
makkah_weather = makkah_weather[makkah_weather.date >= s].reset_index(drop=True)
makkah_weather.head()
Out[47]:
city date temperature condition wind humidity barometer visibility
0 makkah 2016-01-01 00:00:00 21.0 Passing clouds 7.0 0.64 1015.0 16.0
1 makkah 2016-01-01 01:00:00 20.0 Passing clouds 7.0 0.60 1014.0 16.0
2 makkah 2016-01-01 02:00:00 20.0 Passing clouds 11.0 0.60 1014.0 16.0
3 makkah 2016-01-01 03:00:00 20.0 Passing clouds 11.0 0.56 1014.0 16.0
4 makkah 2016-01-01 04:00:00 20.0 Passing clouds 11.0 0.56 1014.0 16.0

2- Features Selection

1- Group Data Daily, Monthly and Yearly and Extract Min, Max and Ave.

In [48]:
# Daily max, min & mean data
d_temp = makkah_weather.groupby([makkah_weather['date'].dt.date]).agg(['min','max','mean'])

# Monthly max, min & mean data
m_temp = makkah_weather.groupby([makkah_weather['date'].dt.year, makkah_weather['date'].dt.month
                                ]).agg(['min','max','mean'])

# Yearly max, min & mean data
y_temp = makkah_weather.groupby(makkah_weather['date'].dt.year).agg(['min','max','mean'])
In [49]:
# Daily max and min temperature
dmax = d_temp['temperature']['max'].reset_index(drop=True)
dmin = d_temp['temperature']['min'].reset_index(drop=True)

# Monthly max and min temperature
mmax = m_temp['temperature']['max'].reset_index(drop=True)
mmin = m_temp['temperature']['min'].reset_index(drop=True)

# Yearly max and min temperature
ymax = y_temp['temperature']['max'].reset_index(drop=True)
ymin = y_temp['temperature']['min'].reset_index(drop=True)

2- Removing Seasonality

In [50]:
# Minimum Temperatures
t = Deseasonalizer(model='multiplicative', sp=365)
y_dmin = t.fit_transform(dmin)
plot_ys(dmin, y_dmin, labels=['Actual','Deseasonalized']);
plt.title('Minimum Temperatures 2016 - 2020');
print('Stationarity test (ADF) for min temp.')
perform_adf_test(y_dmin)
Stationarity test (ADF) for min temp.
ADF Statistic: -7.718420
p-value: 0.000000
In [51]:
# Maximum Temperatures
y_dmax = t.fit_transform(dmax)
plot_ys(dmax, y_dmax, labels=['Actual','Deseasonalized']);
plt.title('Maximum Temperatures 2016 - 2020');
print('Stationarity test (ADF) for max temp.')
perform_adf_test(y_dmax)
Stationarity test (ADF) for max temp.
ADF Statistic: -13.990028
p-value: 0.000000
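For intuition, multiplicative deseasonalization divides each observation by a seasonal index for its phase within the cycle. The sketch below uses simple per-phase means as indices; sktime's `Deseasonalizer` estimates them via a proper seasonal decomposition, so this illustrates the idea rather than reproducing its exact algorithm:

```python
import numpy as np

def deseasonalize_multiplicative(y, sp):
    """Divide each observation by the seasonal index of its phase.

    Simplified sketch: the index is the phase mean relative to the overall mean.
    """
    y = np.asarray(y, dtype=float)
    phase = np.arange(len(y)) % sp
    idx = np.array([y[phase == p].mean() for p in range(sp)]) / y.mean()
    return y / idx[phase]

# Constant level 10 with a period-2 multiplicative seasonal pattern [1.2, 0.8]
y = 10 * np.tile([1.2, 0.8], 6)
print(deseasonalize_multiplicative(y, sp=2))  # all ~10.0
```

With `sp=365` as above, the phase is the day of the (non-leap) year, and the transform removes the annual temperature cycle before stationarity testing.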

3- Models Selection

1- Check Different Models To Forecast Minimum Temperatures

In [52]:
y_train, y_test = temporal_train_test_split(y_dmin, test_size=30)
print(y_train.shape, y_test.shape)
(1704,) (30,)
In [53]:
fh = np.arange(len(y_test)) + 1
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
forecaster.fit(t.fit_transform(y_train))
y_pred = forecaster.predict(fh)
plot_ys(y_test, y_pred, labels=[ "y_test", "y_pred"]);
In [54]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.0514
RMSE 1.69828
Out[54]:
test pred
1704 24.0 25.0
1705 26.0 25.0
1706 28.0 25.0
1707 28.0 25.0
1708 28.0 25.0
1709 25.0 25.0
1710 26.0 25.0
1711 24.0 25.0
1712 26.0 25.0
1713 27.0 25.0
1714 28.0 25.0
1715 27.0 25.0
1716 29.0 25.0
1717 27.0 25.0
1718 25.0 25.0
1719 26.0 25.0
1720 27.0 25.0
1721 26.0 25.0
1722 25.0 25.0
1723 26.0 25.0
1724 26.0 25.0
1725 26.0 25.0
1726 25.0 25.0
1727 26.0 25.0
1728 26.0 25.0
1729 25.0 25.0
1730 26.0 25.0
1731 26.0 25.0
1732 26.0 25.0
1733 26.0 25.0
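The SMAPE values reported here come from sktime's `smape_loss`, which (up to edge-case handling) matches the standard symmetric MAPE formula; a minimal reference implementation:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, ranging over [0, 2]."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(2 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

print(smape([25, 26, 27], [25, 26, 27]))  # 0.0 (perfect forecast)
print(round(smape([100], [110]), 5))      # 0.09524
```

A value of 0.05 therefore means the forecasts are off by roughly 5% of the average magnitude of the true and predicted values.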
In [55]:
regressor = KNeighborsRegressor(n_neighbors=5)
forecaster = ReducedRegressionForecaster(regressor=regressor, window_length=12, strategy="recursive")
forecaster.fit(t.fit_transform(y_train))
y_pred = forecaster.predict(fh)
plot_ys(y_test, y_pred, labels=[ "y_test", "y_pred"]);
In [56]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.04597
RMSE 1.44828
Out[56]:
test pred
1704 24.0 26.0
1705 26.0 26.0
1706 28.0 26.0
1707 28.0 26.0
1708 28.0 25.0
1709 25.0 26.0
1710 26.0 26.0
1711 24.0 26.0
1712 26.0 26.0
1713 27.0 26.0
1714 28.0 26.0
1715 27.0 25.0
1716 29.0 25.0
1717 27.0 25.0
1718 25.0 26.0
1719 26.0 25.0
1720 27.0 25.0
1721 26.0 25.0
1722 25.0 25.0
1723 26.0 25.0
1724 26.0 24.0
1725 26.0 25.0
1726 25.0 25.0
1727 26.0 25.0
1728 26.0 25.0
1729 25.0 25.0
1730 26.0 25.0
1731 26.0 25.0
1732 26.0 25.0
1733 26.0 25.0
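`ReducedRegressionForecaster` with `strategy="recursive"` reduces forecasting to tabular regression: fit a regressor on sliding lag windows, then roll forward, feeding each prediction back in as a lag. A numpy sketch of that loop, with plain least squares standing in for the sklearn regressor:

```python
import numpy as np

def fit_recursive(y, window):
    """Fit OLS on sliding lag windows: [y[i], ..., y[i+w-1], 1] -> y[i+w]."""
    X = np.array([np.r_[y[i:i + window], 1.0] for i in range(len(y) - window)])
    t = y[window:]
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    return beta

def predict_recursive(y, beta, window, steps):
    """Roll the window forward, feeding each prediction back in as a lag."""
    hist = list(y[-window:])
    preds = []
    for _ in range(steps):
        preds.append(np.r_[hist[-window:], 1.0] @ beta)
        hist.append(preds[-1])
    return np.array(preds)

y = np.arange(30, dtype=float)           # a pure linear trend
beta = fit_recursive(y, window=5)
print(predict_recursive(y, beta, 5, 3))  # ~[30. 31. 32.]
```

On a pure trend the recursion continues it exactly; on real data, errors compound as predictions are fed back in, which is why short horizons (30 days here) are the realistic use case.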
In [57]:
forecaster = ExponentialSmoothing(seasonal='multiplicative', sp=10) 
param_grid = {'seasonal':['add', 'multiplicative' ],'sp':[30,60,90,120,150,180,
                                                                210,240,270,300,330,360]}

# We fit the forecaster on an initial window, then use temporal cross-validation to find the best parameters
cv = SlidingWindowSplitter(initial_window=int(len(y_train) * 0.5))
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=param_grid)

gscv.fit(y_train)
y_pred = gscv.predict(fh)
plot_ys(y_test, y_pred, labels=["y_test", "y_pred"]);

print(gscv.best_params_)
{'seasonal': 'add', 'sp': 90}
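`SlidingWindowSplitter` generates train/test index pairs that slide forward in time, which is what the grid search above cross-validates over. A simplified sketch of the index generation (window, horizon and step values here are illustrative; sktime's parameters and defaults differ in detail):

```python
import numpy as np

def sliding_window_splits(n, window, fh, step=1):
    """Yield (train_idx, test_idx) pairs sliding forward through a series of length n."""
    for start in range(0, n - window - fh + 1, step):
        train = np.arange(start, start + window)
        test = np.arange(start + window, start + window + fh)
        yield train, test

splits = list(sliding_window_splits(n=20, window=10, fh=3))
print(len(splits))            # 8
print(splits[0][1].tolist())  # [10, 11, 12]
```

Unlike shuffled k-fold, every test window lies strictly after its training window, so no future information leaks into the fit.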
In [58]:
forecaster = gscv.best_forecaster_
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_ys( y_test, y_pred, labels=["y_test", "y_pred"]);
In [59]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.04394
RMSE 1.54082
Out[59]:
test pred
1704 24.0 25.0
1705 26.0 26.0
1706 28.0 26.0
1707 28.0 26.0
1708 28.0 25.0
1709 25.0 25.0
1710 26.0 25.0
1711 24.0 26.0
1712 26.0 25.0
1713 27.0 25.0
1714 28.0 25.0
1715 27.0 25.0
1716 29.0 25.0
1717 27.0 26.0
1718 25.0 25.0
1719 26.0 25.0
1720 27.0 25.0
1721 26.0 26.0
1722 25.0 25.0
1723 26.0 25.0
1724 26.0 25.0
1725 26.0 24.0
1726 25.0 25.0
1727 26.0 25.0
1728 26.0 25.0
1729 25.0 25.0
1730 26.0 25.0
1731 26.0 26.0
1732 26.0 25.0
1733 26.0 25.0
In [60]:
li = [LinearRegression(), KNeighborsRegressor()]
forecaster = ReducedRegressionForecaster(regressor=None, window_length=15, strategy="recursive")

param_grid = {'window_length':[4,5,6,7,8,9,10,12,15,20], 'regressor':li}

# We fit the forecaster on an initial window, then use temporal cross-validation to find the best parameters
cv = SlidingWindowSplitter(initial_window=int(len(y_train) * 0.5))
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=param_grid)

gscv.fit(y_train)
y_pred = gscv.predict(fh)
plot_ys(y_test, y_pred, labels=["y_test", "y_pred"]);

print(gscv.best_params_)
{'regressor': LinearRegression(), 'window_length': 12}
In [61]:
forecaster_min = gscv.best_forecaster_
forecaster_min.fit(y_train)
y_pred = forecaster_min.predict(fh)
plot_ys( y_test, y_pred, labels=["y_test", "y_pred"]);
In [62]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.0366
RMSE 1.37117
Out[62]:
test pred
1704 24.0 25.0
1705 26.0 25.0
1706 28.0 25.0
1707 28.0 25.0
1708 28.0 25.0
1709 25.0 25.0
1710 26.0 25.0
1711 24.0 25.0
1712 26.0 25.0
1713 27.0 26.0
1714 28.0 25.0
1715 27.0 25.0
1716 29.0 25.0
1717 27.0 26.0
1718 25.0 26.0
1719 26.0 26.0
1720 27.0 26.0
1721 26.0 26.0
1722 25.0 26.0
1723 26.0 26.0
1724 26.0 26.0
1725 26.0 26.0
1726 25.0 26.0
1727 26.0 26.0
1728 26.0 26.0
1729 25.0 26.0
1730 26.0 26.0
1731 26.0 26.0
1732 26.0 26.0
1733 26.0 26.0
In [63]:
sts = StandardScaler()
y = np.array(y_dmin).reshape(-1,1)
yn = sts.fit_transform(y)
In [64]:
def sq(seq, step):
    """Split a sequence into sliding windows of length `step` with the next value as target."""
    x, y = list(), list()
    for i in range(len(seq)):
        end_x = i + step
        # Stop once the target index would run past the end of the sequence
        if end_x > len(seq) - 1:
            break
        seq_x, seq_y = seq[i:end_x], seq[end_x]
        x.append(seq_x)
        y.append(seq_y)
    return np.array(x), np.array(y)

Xs, ys = sq(yn, 10)
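To make the window/target pairing concrete, here is the same helper (copied so the block is self-contained) applied to a tiny sequence:

```python
import numpy as np

def sq(seq, step):
    """Sliding windows of length `step` with the next element as target (copy of the helper above)."""
    x, y = [], []
    for i in range(len(seq)):
        end_x = i + step
        if end_x > len(seq) - 1:
            break
        x.append(seq[i:end_x])
        y.append(seq[end_x])
    return np.array(x), np.array(y)

X_demo, y_demo = sq([10, 11, 12, 13, 14], step=2)
print(X_demo.tolist())  # [[10, 11], [11, 12], [12, 13]]
print(y_demo.tolist())  # [12, 13, 14]
```

With `step=10` as in the cell above, each training example for the LSTM is ten consecutive scaled minimum temperatures, and the target is the eleventh.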
In [65]:
Xs = Xs.reshape((Xs.shape[0], Xs.shape[1], 1))
In [66]:
X_train, X_test = temporal_train_test_split(np.array(Xs), test_size=0.2)
y_train, y_test = temporal_train_test_split(np.array(ys), test_size=0.2)

print(X_train.shape, y_train.shape)
(1379, 10, 1) (1379, 1)
In [67]:
model = keras.Sequential()
model.add(
  keras.layers.Bidirectional(
    keras.layers.LSTM(
        units=64,
        activation='relu',
        input_shape=(X_train.shape[1], X_train.shape[2])  # (timesteps, features), not the full batch shape
    )
  )
)
model.add(keras.layers.Dropout(rate=0.2))
model.add(keras.layers.Dense(units=1))

opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='mean_squared_error', optimizer=opt)
In [68]:
history = model.fit(X_train,
                    y_train,
                    epochs=50,
                    batch_size=32,
                    validation_split=0.1,
                    shuffle=False,
                   verbose = 0)

In [69]:
plt.figure(figsize=(16,6))
plt.plot(history.history['loss'], label='train');
plt.plot(history.history['val_loss'], label='validation');
plt.legend()
plt.grid()
In [70]:
y_pred = model.predict(X_test);

y_pred = sts.inverse_transform(y_pred)
y_test = sts.inverse_transform(y_test)

y_pred = pd.Series(y_pred.flatten(order='F'))
y_test = pd.Series(y_test.flatten(order='F'))

plot_ys(y_test, y_pred, labels=['true','pred']);
In [71]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.05367
RMSE 1.73917
Out[71]:
test pred
0 26.0 26.0
1 27.0 26.0
2 27.0 26.0
3 27.0 26.0
4 27.0 26.0
... ... ...
340 25.0 26.0
341 26.0 26.0
342 26.0 26.0
343 26.0 26.0
344 26.0 26.0

345 rows × 2 columns
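The SMAPE figures reported by `smape_loss` follow the standard symmetric MAPE definition (an assumption about sktime's exact implementation, but consistent with the magnitudes printed here): the absolute error is scaled by the mean magnitude of the true and predicted values, then averaged over the horizon.

```python
import numpy as np

def smape(y_true, y_pred):
    # symmetric MAPE: 2|y_pred - y_true| / (|y_true| + |y_pred|),
    # averaged over all forecast points
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(2.0 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

# a constant 1-degree miss at ~26 degrees gives a SMAPE near 0.038
print(round(smape([26, 26, 26], [27, 27, 27]), 5))  # 0.03774
```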

2- Check Different Models To Forecast Maximum Temperatures

In [72]:
y_train, y_test = temporal_train_test_split(y_dmax, test_size=30)
print(y_train.shape, y_test.shape)
(1704,) (30,)
In [73]:
fh = np.arange(len(y_test)) + 1
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
forecaster.fit(t.fit_transform(y_train))
y_pred = forecaster.predict(fh)
plot_ys(y_test, y_pred, labels=[ "y_test", "y_pred"]);
In [74]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.06461
RMSE 3.00931
Out[74]:
test pred
1704 36.0 35.0
1705 36.0 34.0
1706 36.0 34.0
1707 35.0 34.0
1708 36.0 34.0
1709 36.0 34.0
1710 34.0 34.0
1711 34.0 34.0
1712 31.0 34.0
1713 34.0 34.0
1714 34.0 34.0
1715 35.0 34.0
1716 39.0 34.0
1717 40.0 34.0
1718 37.0 34.0
1719 34.0 34.0
1720 35.0 34.0
1721 38.0 34.0
1722 39.0 34.0
1723 35.0 34.0
1724 37.0 34.0
1725 37.0 34.0
1726 36.0 34.0
1727 42.0 34.0
1728 39.0 34.0
1729 36.0 34.0
1730 33.0 34.0
1731 36.0 34.0
1732 35.0 34.0
1733 36.0 34.0
In [75]:
regressor = KNeighborsRegressor(n_neighbors=5)
forecaster = ReducedRegressionForecaster(regressor=regressor, window_length=12, strategy="recursive")
forecaster.fit(t.fit_transform(y_train))
y_pred = forecaster.predict(fh)
plot_ys(y_test, y_pred, labels=[ "y_test", "y_pred"]);
In [76]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.07329
RMSE 3.27796
Out[76]:
test pred
1704 36.0 33.0
1705 36.0 34.0
1706 36.0 34.0
1707 35.0 34.0
1708 36.0 33.0
1709 36.0 34.0
1710 34.0 35.0
1711 34.0 34.0
1712 31.0 34.0
1713 34.0 33.0
1714 34.0 35.0
1715 35.0 34.0
1716 39.0 34.0
1717 40.0 35.0
1718 37.0 33.0
1719 34.0 33.0
1720 35.0 33.0
1721 38.0 34.0
1722 39.0 33.0
1723 35.0 34.0
1724 37.0 34.0
1725 37.0 35.0
1726 36.0 34.0
1727 42.0 33.0
1728 39.0 33.0
1729 36.0 33.0
1730 33.0 33.0
1731 36.0 34.0
1732 35.0 34.0
1733 36.0 34.0
In [77]:
forecaster = ExponentialSmoothing(seasonal='multiplicative', sp=10) 
param_grid = {'seasonal':['add', 'multiplicative' ],'sp':[30,60,90,120,150,180,
                                                                210,240,270,300,330,360]}

# we fit the forecaster on the initial window, and then use temporal cross-validation to find the optimal parameter
cv = SlidingWindowSplitter(initial_window=int(len(y_train) * 0.5))
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=param_grid)

gscv.fit(y_train)
y_pred = gscv.predict(fh)
plot_ys(y_test, y_pred, labels=["y_test", "y_pred"]);

print(gscv.best_params_)
{'seasonal': 'multiplicative', 'sp': 90}
In [78]:
forecaster_max = gscv.best_forecaster_
forecaster_max.fit(y_train)
y_pred = forecaster_max.predict(fh)
plot_ys( y_test, y_pred, labels=["y_test", "y_pred"]);
In [79]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.04694
RMSE 2.14219
Out[79]:
test pred
1704 36.0 36.0
1705 36.0 38.0
1706 36.0 37.0
1707 35.0 36.0
1708 36.0 36.0
1709 36.0 37.0
1710 34.0 36.0
1711 34.0 37.0
1712 31.0 37.0
1713 34.0 36.0
1714 34.0 37.0
1715 35.0 36.0
1716 39.0 37.0
1717 40.0 37.0
1718 37.0 37.0
1719 34.0 37.0
1720 35.0 37.0
1721 38.0 37.0
1722 39.0 36.0
1723 35.0 36.0
1724 37.0 36.0
1725 37.0 37.0
1726 36.0 37.0
1727 42.0 37.0
1728 39.0 37.0
1729 36.0 36.0
1730 33.0 36.0
1731 36.0 36.0
1732 35.0 36.0
1733 36.0 36.0
In [80]:
li = [LinearRegression(), KNeighborsRegressor()]
forecaster = ReducedRegressionForecaster(regressor=None, window_length=15, strategy="recursive")

param_grid = {'window_length':[4,5,6,7,8,9,10,12,15,20], 'regressor':li}

# we fit the forecaster on the initial window, and then use temporal cross-validation to find the optimal parameter
cv = SlidingWindowSplitter(initial_window=int(len(y_train) * 0.5))
gscv = ForecastingGridSearchCV(forecaster, cv=cv, param_grid=param_grid)

gscv.fit(y_train)
y_pred = gscv.predict(fh)
plot_ys(y_test, y_pred, labels=["y_test", "y_pred"]);

print(gscv.best_params_)
{'regressor': LinearRegression(), 'window_length': 6}
In [81]:
forecaster = gscv.best_forecaster_
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_ys( y_test, y_pred, labels=["y_test", "y_pred"]);
In [82]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.06377
RMSE 2.99907
Out[82]:
test pred
1704 36.0 35.0
1705 36.0 35.0
1706 36.0 34.0
1707 35.0 34.0
1708 36.0 34.0
1709 36.0 34.0
1710 34.0 34.0
1711 34.0 34.0
1712 31.0 34.0
1713 34.0 34.0
1714 34.0 34.0
1715 35.0 34.0
1716 39.0 34.0
1717 40.0 34.0
1718 37.0 34.0
1719 34.0 34.0
1720 35.0 34.0
1721 38.0 34.0
1722 39.0 34.0
1723 35.0 34.0
1724 37.0 34.0
1725 37.0 34.0
1726 36.0 34.0
1727 42.0 34.0
1728 39.0 34.0
1729 36.0 34.0
1730 33.0 34.0
1731 36.0 34.0
1732 35.0 34.0
1733 36.0 34.0
In [83]:
stm = StandardScaler()
y = np.array(y_dmax).reshape(-1,1)
yn = stm.fit_transform(y)
In [84]:
def sq(seq, step):
    # Split a sequence into sliding windows of length `step` (the inputs)
    # paired with the value that follows each window (the target).
    x, y = list(), list()
    for i in range(len(seq)):
        end_x = i + step
        # stop once the target index would run past the end of the sequence
        if end_x > len(seq) - 1:
            break
        seq_x, seq_y = seq[i:end_x], seq[end_x]
        x.append(seq_x)
        y.append(seq_y)
    return np.array(x), np.array(y)

Xs, ys = sq(yn, 10)
In [85]:
Xs = Xs.reshape((Xs.shape[0], Xs.shape[1], 1))
In [86]:
X_train, X_test = temporal_train_test_split(np.array(Xs), test_size=0.2)
y_train, y_test = temporal_train_test_split(np.array(ys), test_size=0.2)

print(X_train.shape, y_train.shape)
(1379, 10, 1) (1379, 1)
In [87]:
model = keras.Sequential()
model.add(
    keras.layers.Bidirectional(
        keras.layers.LSTM(units=64, activation='relu'),
        # input shape is (timesteps, features); the batch dimension is omitted
        input_shape=(X_train.shape[1], X_train.shape[2])
    )
)
model.add(keras.layers.Dropout(rate=0.2))
model.add(keras.layers.Dense(units=1))

opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='mean_squared_error', optimizer=opt)
In [88]:
history = model.fit(X_train,
                    y_train,
                    epochs=50,
                    batch_size=32,
                    validation_split=0.1,
                    shuffle=False,
                   verbose=0)
WARNING:tensorflow:Layer bidirectional_1 is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

In [89]:
plt.figure(figsize=(16,6))
plt.plot(history.history['loss'], label='train');
plt.plot(history.history['val_loss'], label='validation');
plt.legend()
plt.grid()
In [90]:
y_pred = model.predict(X_test);

y_pred = stm.inverse_transform(y_pred)
y_test = stm.inverse_transform(y_test)

y_pred = pd.Series(y_pred.flatten(order='F'))
y_test = pd.Series(y_test.flatten(order='F'))

plot_ys(y_test, y_pred, labels=['true','pred']);
In [91]:
print('SMAPE',round(smape_loss(y_test, y_pred),5))
print('RMSE', round(np.sqrt(mean_squared_error(y_test, y_pred)),5))
pd.DataFrame({'test':round(y_test) ,'pred':round(y_pred)})
SMAPE 0.04629
RMSE 2.14731
Out[91]:
test pred
0 34.0 34.0
1 33.0 34.0
2 34.0 33.0
3 34.0 34.0
4 35.0 34.0
... ... ...
340 36.0 37.0
341 33.0 34.0
342 36.0 33.0
343 35.0 35.0
344 36.0 35.0

345 rows × 2 columns

4- Conclusion

To simplify the task of forecasting the temperatures, two models were trained: one for forecasting minimum temperatures (FMiT) and one for maximum temperatures (FMaT). Several models were tested, including classical time-series techniques and an LSTM deep learning model. FMiT performed better than FMaT, with errors of ±1.3 degrees and ±2.14 degrees respectively (within one standard deviation). The selected model for FMiT is a ReducedRegressionForecaster using linear regression, which reduces the forecasting task to the simpler, related task of tabular regression. The selected model for FMaT is Exponential Smoothing, which smooths the time-series data with an exponentially weighted window. An important note on model selection for FMaT: although the LSTM model is sometimes better by about 0.4 degrees, its results vary from run to run because of the random initialization of the weights.
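To make the "reduction to tabular regression" idea concrete, here is a minimal sketch of the recursive strategy using a plain scikit-learn regressor (illustrative only; sktime's ReducedRegressionForecaster handles this internally):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def reduced_forecast(series, window, horizon):
    # Reduction: turn the series into (lag-window -> next value) rows,
    # fit an ordinary tabular regressor on them, then roll forecasts
    # forward by feeding each prediction back in as the newest lag.
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    reg = LinearRegression().fit(X, y)
    history = list(series[-window:])
    preds = []
    for _ in range(horizon):
        nxt = reg.predict([history[-window:]])[0]
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)

# on a clean linear trend the reduction continues the trend exactly
print(np.round(reduced_forecast(np.arange(20), 6, 3), 2))  # [20. 21. 22.]
```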

The Main Takeaways:

1. The average temperatures for the same months show a downward trend for January, February, March and April (0.25 - 1 °C)

2. The average temperatures for the same months show an upward trend for May, June, July, August, October, November and December (0.25 - 1.75 °C)